Simplest example for my IPython parallel use


In [1]:
import pandas as pd
import glob
folders = glob.glob('/Volumes/Data/ciss/opus/N*')

In [2]:
len(folders)


Out[2]:
490

In [3]:
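def process_folder(folder):
    # Imports live inside the function so they also run on the remote engines.
    import glob
    from pysis.exceptions import ProcessError
    from pyciss import pipeline as p
#     done = glob.glob(folder+'/*.map.cal.cub')
#     if done:
#         return folder,True
    img_name = glob.glob(folder+'/*.LBL')
    if not img_name:
        # No label file found, so there is nothing to calibrate.
        return folder, False
    try:
        map_name = p.calibrate_ciss(img_name[0])
    except ProcessError:
        return folder, False
    return folder, True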
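
The commented-out lines in process_folder hint at a way to skip folders that were already calibrated. A minimal sketch of that check as a standalone helper follows. The name already_calibrated is made up for illustration, and the '*.map.cal.cub' pattern is taken from the comment, on the assumption that the pipeline writes its output under that name.

```python
import glob
import os

def already_calibrated(folder, pattern='*.map.cal.cub'):
    """Return True if `folder` already holds a file matching `pattern`.

    Hypothetical helper; the default pattern is an assumption taken from
    the commented-out check in process_folder.
    """
    return bool(glob.glob(os.path.join(folder, pattern)))
```

If used, the check would have to sit inside process_folder (or in a module the engines can import), since the function body runs on the engines.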

In [4]:
folders = pd.Series(folders)

I'm using pandas here mostly for its data-wrangling capabilities, such as the string filter in the next cell.


In [5]:
SOI = folders[folders.str.contains('N146734')]
SOI


Out[5]:
0     /Volumes/Data/ciss/opus/N1467345444
1     /Volumes/Data/ciss/opus/N1467345503
2     /Volumes/Data/ciss/opus/N1467345562
3     /Volumes/Data/ciss/opus/N1467345621
4     /Volumes/Data/ciss/opus/N1467345680
5     /Volumes/Data/ciss/opus/N1467345739
6     /Volumes/Data/ciss/opus/N1467345798
7     /Volumes/Data/ciss/opus/N1467345857
8     /Volumes/Data/ciss/opus/N1467345916
9     /Volumes/Data/ciss/opus/N1467345975
10    /Volumes/Data/ciss/opus/N1467346034
11    /Volumes/Data/ciss/opus/N1467346093
12    /Volumes/Data/ciss/opus/N1467346152
13    /Volumes/Data/ciss/opus/N1467346211
14    /Volumes/Data/ciss/opus/N1467346270
15    /Volumes/Data/ciss/opus/N1467346329
16    /Volumes/Data/ciss/opus/N1467346388
17    /Volumes/Data/ciss/opus/N1467346447
18    /Volumes/Data/ciss/opus/N1467346506
19    /Volumes/Data/ciss/opus/N1467346565
20    /Volumes/Data/ciss/opus/N1467346624
21    /Volumes/Data/ciss/opus/N1467347210
22    /Volumes/Data/ciss/opus/N1467347249
23    /Volumes/Data/ciss/opus/N1467347445
24    /Volumes/Data/ciss/opus/N1467347504
dtype: object
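
For comparison, the same selection can be sketched without pandas. The list below is a shortened stand-in for folders: the first two paths come from the output above, the third is a made-up ID that should not match.

```python
folders_list = [
    '/Volumes/Data/ciss/opus/N1467345444',
    '/Volumes/Data/ciss/opus/N1467347504',
    '/Volumes/Data/ciss/opus/N1999999999',  # made-up ID, should not match
]

# Plain substring test. It matches the pandas filter here because 'N146734'
# has no regex metacharacters (str.contains treats its pattern as a regex
# by default).
soi_list = [f for f in folders_list if 'N146734' in f]
```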

Before running the next cell, you need to launch some engines (ipengines) from the "Cluster" tab in the IPython notebook dashboard.


In [6]:
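# In IPython 4 and later this lives in the separate ipyparallel package
# (from ipyparallel import Client).
from IPython.parallel import Client
c = Client()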

Getting the load-balanced view, because I don't care which engine processes which file. (This is about as embarrassingly parallel as it gets.)


In [7]:
lbview = c.load_balanced_view()

  • Using map_async so that I can do other stuff while the engines are working.
  • The returned results object is my handle to what is going on in the background.

In [8]:
results = lbview.map_async(process_folder, SOI)
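
The submit-now, collect-later pattern can be tried without a running cluster. The sketch below swaps in the standard library's concurrent.futures thread pool for the IPython engines, so it is an analogy rather than the IPython parallel API; fake_process and the folder names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_process(folder):
    """Stand-in for process_folder: pretend to work, then report success."""
    time.sleep(0.01)
    return folder, True

worklist = ['a', 'b', 'c']  # placeholder folder names

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a handle (a Future) per task.
    futures = [pool.submit(fake_process, f) for f in worklist]
    # Other work could happen here while the pool is busy.
    n_done_so_far = sum(f.done() for f in futures)
    # result() blocks until the corresponding task has finished.
    outcomes = [f.result() for f in futures]
```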

Using a widget called IntProgress for a simple progress bar.


In [9]:
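# In IPython 4 and later the widgets live in the separate ipywidgets package
# (from ipywidgets import IntProgress).
from IPython.html.widgets import IntProgress
from IPython.display import display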


:0: FutureWarning: IPython widgets are experimental and may change in the future.

In [10]:
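from time import sleep
prog = IntProgress(min=0, max=len(SOI))
display(prog)
# Poll the async result until every task has finished.
while not results.ready():
    prog.value = results.progress  # number of tasks completed so far
    sleep(5)
prog.value = results.progress  # final update so the bar ends up full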

Just put this in a tool module for easy calling:


In [ ]:
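from IPython.html.widgets import IntProgress
from IPython.display import display
from time import sleep
def show_progress(results, worklist):
    """Show a progress bar that tracks `results` until all tasks are done."""
    prog = IntProgress(min=0, max=len(worklist))
    display(prog)
    while not results.ready():
        prog.value = results.progress  # number of tasks completed so far
        sleep(5)
    prog.value = results.progress  # final update so the bar ends up full

show_progress(results, SOI)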
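
Once everything has finished, the (folder, ok) tuples returned by process_folder still need to be looked at. A minimal sketch of that step, assuming the outcomes arrive as a list of such tuples; split_outcomes is a made-up helper name and the example data is invented.

```python
def split_outcomes(outcomes):
    """Split (folder, ok) tuples into lists of succeeded and failed folders."""
    succeeded = [folder for folder, ok in outcomes if ok]
    failed = [folder for folder, ok in outcomes if not ok]
    return succeeded, failed

# Made-up outcomes in the shape process_folder returns.
example = [('N1', True), ('N2', False), ('N3', True)]
succeeded, failed = split_outcomes(example)
```

With a live cluster, the input would be results.get(), which blocks until all tasks are finished.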